ASYMPTOTIC BEHAVIOR OF THE UNCONDITIONAL NPMLE
OF THE LENGTH-BIASED SURVIVOR FUNCTION FROM RIGHT
CENSORED PREVALENT COHORT DATA

By Masoud Asgharian and David B. Wolfson 
McGill University 

Right censored survival data collected on a cohort of prevalent 
cases with constant incidence are length-biased, and may be used 
to estimate the length-biased (i.e., prevalent-case) survival function. 
When the incidence rate is constant, so-called stationarity of the in- 
cidence, it is more efficient to use this structure for unconditional 
statistical inference than to carry out an analysis by conditioning 
on the observed truncation times. It is well known that, due to the 
informative censoring for prevalent cohort data, the Kaplan-Meier 
estimator is not the unconditional NPMLE of the length-biased sur- 
vival function and the asymptotic properties of the NPMLE do not 
follow from any known result. We present here a detailed deriva- 
tion of the asymptotic properties of the NPMLE of the length-biased 
survival function from right censored prevalent cohort survival data 
with follow-up. In particular, we show that the NPMLE is uniformly 
strongly consistent, converges weakly to a Gaussian process, and is 
asymptotically efficient. One important spin-off from these results 
is that they yield the asymptotic properties of the NPMLE of the 
incident-case survival function [see Asgharian, M'Lan and Wolfson 
J. Amer. Statist. Assoc. 97 (2002) 201-209], which is often of prime 
interest in a prevalent cohort study. Our results generalize those given 
by Vardi and Zhang [Ann. Statist. 20 (1992) 1022-1039] under mul- 
tiplicative censoring, which we show arises as a degenerate case in a 
prevalent cohort setting. 

1. Introduction. Left truncated, right censored data have been exten- 
sively studied in the statistics literature. (See [4] for a list of references.) 
Their importance stems from the common use of prevalent cohort study designs
to estimate survival from onset of a specified disease (e.g., [13, 25, 26]).
In such studies patients are identified with prevalent disease at some instant 
in calendar time through a cross-sectional survey. Those so identified are 
then followed forward in time until failure or censoring. Since the possibly 
censored observed survival times are generated from prevalent cases, they 
are left truncated. Failure to account for left truncation can lead to sub- 
stantial overestimation of the survivor function. Indeed, Wolfson et al. [28] 
showed that survival with dementia from onset had almost certainly been 
overestimated because of the failure to take left truncation into account. In 
fact, their adjusted (for left truncation) estimated median survival time was 
3.3 years versus 6.6 years for the unadjusted estimated median survival. 

When the left truncation time distribution is not specified, the approach 
to estimating the unbiased survival function is to condition on the observed 
truncation times (e.g., [1, 27]). However, when there is good reason to assume
that the initiation times follow a stationary Poisson process, which
implies more structure on the truncation times (the so-called stationarity
assumption [2]), this special structure may be exploited. Under stationarity,
it is not necessary to condition on the observed truncation times and,
instead, the natural estimator is the unconditional nonparametric maximum 
likelihood estimator (NPMLE) [4]. Wang [25] had suggested that an uncon- 
ditional NPMLE of the unbiased survivor function is more efficient than 
its conditional counterpart, under stationarity. This improvement in effi- 
ciency was later confirmed by Asgharian, M'Lan and Wolfson ([4], Figures 
3 and 4, but note the incorrect captions). We shall, in the sequel, reserve 
the terminology "length-biased" for left truncation under the stationarity 
assumption. 

Several authors have discussed maximum likelihood estimation in the 
presence of length-biased data [10, 11, 21, 22]. The latter two papers treated 
the question in the general setting of selection bias, though without allowing 
for censoring. 

This paper establishes the asymptotic properties of the nonparametric 
maximum likelihood estimator (NPMLE) of the length-biased survival func- 
tion when the observed data are length-biased and right censored. Now, 
while the length-biased survival function is itself of little direct interest, the 
unbiased survival function, which is simply related, is of central importance 
in a survival analysis based on prevalent cohort data. By exploiting the 
mapping which relates the length-biased with the unbiased survival func- 
tion, one may use the results established here to obtain the asymptotic 
properties of the NPMLE of the unbiased survival function [4]. That is, we 
present here, for the first time, the foundation upon which the asymptotic 
inference described in Asgharian, M'Lan and Wolfson rests. It should be 
pointed out though, that the definition of Q2 given above equation (8) in [4] 
is erroneous, though with only minor effect on the main result. This error is
corrected here. 

A subtlety missed by several authors is that the Kaplan-Meier estimator 
is not suitable as the NPMLE of the length-biased survival function since 
the right censoring induced by the sampling scheme is informative; in order 
to be censored, one's failure time must be longer than one's truncation time, 
that is, be observable. (See [4] for further details.) 

Our line of attack is similar to that of Vardi [23] and Vardi and Zhang [24],
who derived the NPMLE of the length-biased survival function and estab- 
lished its asymptotic properties under multiplicative censoring. They pointed 
out that the likelihood obtained under multiplicative censoring has the same 
form as the likelihood obtained from prevalent cohort study data with follow- 
up, when the stationarity assumption holds. Importantly, Vardi [23] noted 
that although the maximum likelihood estimates obtained from these com- 
mon likelihoods are the same, the asymptotic properties depend on the sam- 
pling mechanism that gives rise to the data and must be established afresh 
in each setting. 

It is therefore instructive to place multiplicative censoring in the context 
of prevalent cohort studies in order to underscore the differences between 
the two schemes. We also derive the likelihood conditional on the number of 
censored observations, not because such conditional inference is carried out 
in practice, but merely to contrast this with the unconditional likelihood. 
Consider, therefore, the following three situations: 

(i) The number of subjects identified at recruitment is k. All subjects 
who are not lost to follow-up are followed until the end of the study period. 
Those subjects who are lost to follow-up or survive to the end of the study 
are right censored, the remainder having failed in this time period. It is 
assumed that the censoring of the residual life times (also called the forward 
recurrence times) is random. That is, the times from recruitment until failure 
are randomly censored. If M denotes the (random) number of uncensored
subjects at the end of the study period, then N = k - M denotes the number
of censored observations.

(ii) The scenario is the same as that of (i) except that M and N are 
fixed at the observed values, m and n, respectively, and analyses are carried 
out conditionally. 

(iii) The number of subjects identified at the cross-sectional stage is 
k = m + n. At this stage, a fixed number n are immediately censored. The 
remaining m are followed until failure. It is easily seen that this sampling 
scheme is equivalent to that of multiplicative censoring. 

The setup described by (i) occurs in practice most frequently, and is the 
focus of this paper. In fact, in Section 7 we show explicitly how sampling 
scheme (iii) arises as a particular, although degenerate, case of scheme (i). 
Under (iii) censoring is precluded after recruitment, which is clearly an unrealistic
assumption in a prevalent cohort study with follow-up. See [23]
though for examples of multiplicative censoring in different contexts. Sec- 
tion 7 contains further discussion on schemes (ii) and (iii). 

The above generalities are perhaps better understood through The Cana- 
dian Study of Health and Aging (CSHA), a large prevalent cohort study with 
follow-up conducted to investigate, primarily, various aspects of dementia in 
the elderly Canadian population. 

Briefly, during a six-month period in 1991 roughly 10,000 Canadians 
over the age of 65 were recruited and screened for prevalent dementia. De- 
mentias considered included mainly probable Alzheimer's disease, possible 
Alzheimer's disease and vascular dementia. At the time of diagnosis, age at 
onset was ascertained from the patient's caregiver. Those subjects diagnosed 
with dementia in 1991 were followed until censoring or death. Follow-up 
ended in 1996, and subjects who were still alive were deemed to have been 
right censored. Very few subjects were lost to follow-up (also considered to 
be right censored) between 1991 and 1996. Times of death from any cause, 
or of censoring, were recorded for all subjects diagnosed with dementia. 
(See [28] for further details.) 

One of the many aims of the CSHA was to estimate the unbiased survival 
function of subjects with incident dementia, where the origin was date of 
onset and the endpoint was death from any cause. The data available for 
this estimation problem had several features: (a) They were left truncated
because subjects with dementia were identified as prevalent rather than incident
cases. (b) The underlying process that generated the onset times of
dementia was thought to have been roughly stationary in the sense that
the intensity function of the initiation process was constant; the basis for
this assumption is discussed by Asgharian, M'Lan and Wolfson ([4], Figure 5
(note the incorrect caption)) and Asgharian, Wolfson and Zhang [2].
The observed survival times were, therefore, length-biased. (c) The interval
from onset to recruitment, the current life time (also called the backward
recurrence time), as well as the minimum of the residual life time (the
interval from recruitment to death) and the censoring time, were recorded.
That is, there was more information than that contained in the length-biased,
possibly censored, survival times alone. (d) The censoring of the residual
life times was random.

Inference about the length-biased survival function from onset with de- 
mentia could be based on the current life times together with their possibly 
randomly censored residual life times, as was done by Asgharian, M'Lan and 
Wolfson [4]. Alternatively, with the same data, though less conventionally, 
one could have conditioned on the observed number of censored subjects and 
those who died, so that all ensuing inference would have been conditional 
[see (ii)]. 



It is of interest to note that length-biased sampling also arises when a 
renewal process is sampled at some point t. The interval surrounding t has 
the density $g(x) = x f(x)/\mu_X$, where $f$ is the density and $\mu_X$ the (finite)
mean of the sojourn time distribution. See [2] for the differences between the 
two settings and a characterization of stationarity. Other sampling schemes 
have been discussed in the literature and can be depicted using the Lexis 
diagram [7, 17, 18], and it is possible to carry out inference from a prevalent 
cohort study under appropriate restrictions when there is no follow-up [15, 
16, 20]. 

The layout of the paper is as follows: In Section 2 we present the likeli- 
hoods for the different sampling schemes. Thereafter, we focus on sampling 
scheme (i). A general overview of the proofs is given in Section 3. In Section 4 
we establish uniform consistency of the NPMLE, and in Section 5 we discuss 
weak convergence of the NPMLE. Asymptotic efficiency of the NPMLE is 
presented in Section 6. In Section 7 we return briefly to schemes (ii) and 
(iii), expanding on their relationships to scheme (i). Section 8 summarizes 
our results and contains some concluding comments. 

2. Preliminaries and the likelihoods for different sampling schemes. 

2.1. Preliminaries. While "stationarity" refers to the pattern of chrono- 
logical cross-sectional sampling, all of the notation below relates to current- 
age (current life time), failure and censoring durations for the individual 
subjects sampled. Suppose that associated with each subject in a target 
population we have a triple $(X', T', C)$, where $X'$ represents the failure
time, $T'$ the truncation time and $C$ the censoring time. Often a reasonable
assumption is that $X'$ is independent of $(T', C)$, while $P(C > T') = 1$ [25].
In a cross-sectional survey subjects are observed only if $X' > T'$. Under the
stationarity assumption, the survival time density of the observed subjects,
the length-biased density, say $g$, is related to $f_{X'}$, the unbiased density,
through the equation

$$ g(x) = f_{X' \mid X' > T'}(x \mid X' > T') = \frac{x f_{X'}(x)}{\mu_{X'}}. $$
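
As a hedged numerical illustration of the length-biased density, the sketch below assumes (our choice, not the paper's) that the unbiased lifetime is Exponential with rate 1, in which case $g(x) = x f(x)/\mu$ is a Gamma(2, 1) density with mean 2. It compares direct draws from $g$ with draws obtained by mimicking a cross-sectional survey under constant incidence. The function names and the onset window of 40 time units are hypothetical.

```python
import random


def draw_length_biased_exponential(rate, size, seed=0):
    """Draw from g(x) = x f(x) / mu for f = Exponential(rate).

    For this particular f, g is the Gamma(2, rate) density, i.e. the law of
    the sum of two independent Exponential(rate) variables.
    """
    rng = random.Random(seed)
    return [rng.expovariate(rate) + rng.expovariate(rate) for _ in range(size)]


def draw_prevalent_cases(rate, size, window=40.0, seed=1):
    """Mimic a cross-sectional survey at calendar time 0.

    Onsets are uniform on [-window, 0] (constant incidence), so the truncation
    time is uniform on [0, window]. A subject is kept only if still alive at
    time 0, that is, if the lifetime exceeds the truncation time.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < size:
        truncation_time = window * rng.random()
        lifetime = rng.expovariate(rate)
        if lifetime > truncation_time:
            kept.append(lifetime)
    return kept


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)
```

With rate 1, both sample means should be close to 2, twice the unbiased mean of 1, which is the overestimation that an unadjusted analysis of prevalent cases would inherit.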

We give, initially, the likelihoods derived under (i), (ii) and (iii), in order 
to emphasize the differences between the three situations. Although the 
likelihoods are similar, the appearance of the random censoring indicator 
under scheme (i) (M random) requires special treatment in the derivation 
of the large sample properties of the maximum likelihood estimator. These 
properties are established in Sections 4, 5 and 6 under scheme (i). 

None of the three likelihoods below depends explicitly on the residual 
and current life times separately. However, the derivations of these likeli- 
hoods depend explicitly on knowledge of the current life times, as well as 
the randomly right censored residual life times. These are the data typically
observed in a prevalent cohort study with follow-up, when the times of onset 
are known. 

Associated with each observed subject in a prevalent cohort study, we 
have a triple, 

$$ (A_i, R_i \wedge C_i, \delta_i), \qquad i = 1, 2, \ldots, k, $$

where $A_i$, $R_i$ and $C_i$ are, respectively, the current-age, the residual life time
and the residual censoring time for the $i$th observed subject. The indicator
function $\delta_i$ is the censoring indicator, that is,

$$ \delta_i = \begin{cases} 1, & \text{if the $i$th subject is not censored } (R_i < C_i), \\ 0, & \text{otherwise.} \end{cases} $$

It is reasonable in many applications to assume that $C_i$ is independent of
$(A_i, R_i)$. We adopt this assumption in the sequel. The vectors $(A_i, R_i \wedge C_i, \delta_i)$,
$i = 1, 2, \ldots, k$, are also assumed to be independent.

Note that the failure time and censoring time associated with each observed
subject are, respectively, $X' = A + R$ and $Y' = A + C$. One can
therefore easily show that, in general, if $C$ is independent of $(A, R)$, then
$\mathrm{Cov}(X', Y') = \sigma_A^2 [1 + \rho_{A,R} \sigma_R / \sigma_A]$, where $\sigma_A^2 = \mathrm{Var}(A)$,
$\sigma_R^2 = \mathrm{Var}(R)$ and $\rho_{A,R} = \mathrm{corr}(A, R)$. Thus, except for trivial cases,
failure times and censoring times are positively correlated under stationarity,
since stationarity implies $A$ is conditionally $\mathrm{Unif}(0, X')$ given $X'$, so that
$\sigma_A = \sigma_{X'-A} = \sigma_R$. This then implies that the censoring mechanism in the
setting under study is informative.
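
The positive dependence between the failure time $X' = A + R$ and the censoring time $Y' = A + C$ can be checked by simulation. The sketch below assumes, purely for illustration, an Exponential(1) unbiased lifetime (so that $X'$ is Gamma(2, 1)) and an independent Exponential residual censoring time; under stationarity $A$ is drawn as Unif(0, $X'$) given $X'$. The helper names and the censoring rate are hypothetical.

```python
import random


def simulate_failure_and_censoring_times(size, censor_rate=0.5, seed=0):
    """Return lists of X' = A + R and Y' = A + C for `size` prevalent subjects."""
    rng = random.Random(seed)
    failure_times, censoring_times = [], []
    for _ in range(size):
        # Length-biased lifetime for an Exponential(1) unbiased law: Gamma(2, 1).
        x = rng.expovariate(1.0) + rng.expovariate(1.0)
        a = rng.random() * x                # A | X' ~ Unif(0, X')
        c = rng.expovariate(censor_rate)    # residual censoring time, independent of (A, R)
        failure_times.append(x)
        censoring_times.append(a + c)
    return failure_times, censoring_times


def sample_covariance(u, v):
    """Unbiased sample covariance of two equally long lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / (n - 1)
```

In this exponential special case $A$ and $R$ are independent Exponential(1) variables, so the covariance of $X'$ and $Y'$ reduces to $\mathrm{Var}(A) = 1$, and the simulated value should be close to 1 and clearly positive.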

Under the stationarity assumption, 

$$ f_{A,R}(a, r) = \begin{cases} \dfrac{f_{X'}(a + r)}{\mu_{X'}}, & \text{if } a, r > 0, \\ 0, & \text{otherwise,} \end{cases} \tag{2.1} $$

which corresponds to the well-known expression for the joint density of the 
current and residual life times, respectively, of a renewal process (see [9, 
23]). In the sequel we use $f_U$ for $f_{X'}$, where $U$ in the subscript stands
for "unbiased." Using (2.1), one can easily derive the distribution function,
say $G$, of $X = A + R$, the length-biased survival time, whose density is

$$ g(x) = \frac{x f_U(x)}{\mu_U}. \tag{2.2} $$

Let

$$ G^*(t) = P(A + R \le t \mid \delta = 1), $$

with density function $g^*(t)$. We then have

$$ g^*(t) = \frac{1}{p} \int_0^t f_{A,R}(t - r, r) S_C(r)\, dr = \frac{f_U(t)}{p \mu_U} \int_0^t S_C(r)\, dr = \frac{g(t)}{p t} \int_0^t S_C(r)\, dr, \tag{2.3} $$

where $p = P(\delta = 1) = P(R < C)$ and $S_C(r) = 1 - F_C(r) = 1 - P(C \le r)$.
Suppose $F^*(t) = P(A + C \le t \mid \delta = 0)$, with density function $f^*(t)$. Then

$$ \begin{aligned}
f^*(t) &= \frac{1}{1-p} \int_0^t \int_c^\infty f_{A,R}(t - c, r)\, dr\, dF_C(c) \\
&= \frac{1}{1-p} \int_0^t \int_c^\infty \frac{f_U(t + r - c)}{\mu_U}\, dr\, dF_C(c) \\
&= \frac{1}{(1-p)\mu_U} \int_0^t \int_t^\infty f_U(u)\, du\, dF_C(c) \\
&= \frac{S_U(t) F_C(t)}{\mu_U (1-p)} = \frac{f(t) F_C(t)}{1-p},
\end{aligned} \tag{2.4} $$

where

$$ f(t) = \frac{1}{\mu_U} \int_t^\infty f_U(u)\, du = \frac{S_U(t)}{\mu_U} $$

is the residual lifetime density. We turn now to the derivation of the likelihoods
under the schemes (i), (ii) and (iii) of Section 1.
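
A quick numerical sanity check of (2.3) and (2.4) is sketched below under assumed distributions that are not taken from the paper: the unbiased lifetime is Exponential(1), so that $g(t) = t e^{-t}$ and $f(t) = e^{-t}$, and the residual censoring time is Exponential with a hypothetical rate `THETA`, which gives $p = 1/(1 + \theta)$. Both $g^*$ and $f^*$ should then integrate to one.

```python
import math


def integrate(func, lower, upper, steps=4000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


THETA = 0.5                           # assumed rate of the residual censoring time C
P_UNCENSORED = 1.0 / (1.0 + THETA)    # p = P(R < C) when R ~ Exponential(1)


def survival_c(r):
    """S_C(r) = P(C > r) for C ~ Exponential(THETA)."""
    return math.exp(-THETA * r)


def g(t):
    """Length-biased density for f_U = Exponential(1): g(t) = t exp(-t)."""
    return t * math.exp(-t)


def g_star(t):
    """Equation (2.3): g*(t) = g(t) / (p t) * int_0^t S_C(r) dr."""
    if t == 0.0:
        return 0.0
    return g(t) / (P_UNCENSORED * t) * integrate(survival_c, 0.0, t, steps=200)


def f_star(t):
    """Equation (2.4): f*(t) = f(t) F_C(t) / (1 - p), with f(t) = exp(-t)."""
    return math.exp(-t) * (1.0 - survival_c(t)) / (1.0 - P_UNCENSORED)
```

Integrating `g_star` and `f_star` over a long interval such as [0, 40] gives values close to 1, as expected of conditional densities given $\delta = 1$ and $\delta = 0$, respectively.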



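2.2. Random censoring (M random). This is the case in which $M$ and
$N = k - M$ are random and arises under situation (i) of Section 1. The
observations comprise

$$ (A_i, R_i \wedge C_i, \delta_i), \qquad i = 1, 2, \ldots, k. $$

Let $\mathcal{UC}$ and $\mathcal{C}$ denote, respectively, the sets of indices of the uncensored
and censored observations. Let $a_i$, $r_i$ and $c_j$ denote, respectively, the realized
values of $A_i$, $R_i$ and $C_j$, and let $x_i = a_i + r_i$, $y_j = a_j + c_j$, and $z = a_j + r$, for
$i \in \mathcal{UC}$ and $j \in \mathcal{C}$. The likelihood is

$$ \begin{aligned}
\mathcal{L} &= \Bigl( \prod_{i \in \mathcal{UC}} f_{A,R}(a_i, r_i) \Bigr) \Bigl( \prod_{j \in \mathcal{C}} \int_{r \ge c_j} f_{A,R}(a_j, r)\, dr \Bigr) \\
&\propto \Bigl( \prod_{i \in \mathcal{UC}} dG(x_i) \Bigr) \Bigl( \prod_{j \in \mathcal{C}} \int_{y_j \le z} z^{-1}\, dG(z) \Bigr) \qquad \text{[using (2.1)]} \\
&= \prod_{i=1}^{k} (dG(x_i))^{\delta_i} \Bigl( \int_{y_i \le z} z^{-1}\, dG(z) \Bigr)^{1 - \delta_i},
\end{aligned} \tag{2.5} $$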

which has a form different from the likelihood that leads to the Kaplan-Meier 
estimator in the presence of randomly right censored data. (See, e.g., [14], 
page 15.) 
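
To make the form of (2.5) concrete, the following minimal sketch evaluates its logarithm for a discrete candidate $G$. It assumes that $G$ places mass `masses[j]` on the point `support[j]` and that every uncensored time is itself a support point; the function and argument names are hypothetical.

```python
import math


def log_likelihood(support, masses, uncensored_x, censored_y):
    """Log of (2.5) for a discrete G with masses[j] at support[j].

    Uncensored x_i contribute log dG(x_i); censored y_j contribute
    log of the sum over support points z >= y_j of dG(z) / z.
    """
    mass_at = dict(zip(support, masses))
    total = 0.0
    for x in uncensored_x:
        total += math.log(mass_at[x])
    for y in censored_y:
        tail = sum(m / z for z, m in zip(support, masses) if z >= y)
        total += math.log(tail)
    return total
```

For example, with support points 1 and 2 carrying mass 1/2 each, one uncensored time at 1 and one censored time at 1.5, the likelihood is (1/2)(1/2)/2 = 1/8. The censored contribution weights each admissible point by $z^{-1}$, which is what distinguishes (2.5) from the Kaplan-Meier likelihood.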

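2.3. Random censoring (conditional on M). Here the data arise as in
Section 2.2, but all analyses are carried out conditional on $M = m$ and
$N = k - m = n$. The "effective" observations comprise, therefore,

$$ (A_i, R_i) \sim f_{A,R \mid \delta = 1}, \qquad i = 1, 2, \ldots, m, $$

and

$$ (A_j, C_j) \sim f_{A,C \mid \delta = 0}, \qquad j = 1, 2, \ldots, n. $$

The likelihood contributions are

$$ \begin{aligned}
f_{A,R}(A = a, R = r \mid \delta = 1) &= f_{A,R}(a, r \mid R < C) \\
&= \frac{1}{p} S_C(r) f_{A,R}(a, r) \\
&= \frac{S_C(r)}{p} \frac{f_U(a + r)}{\mu_U} \\
&= \frac{S_C(r)}{p (a + r)}\, g(a + r)
\end{aligned} $$

and

$$ \begin{aligned}
f_{A,C}(A = a, C = c \mid \delta = 0) &= f_{A,C}(a, c \mid R \ge C) \\
&= \frac{1}{1-p} f_C(c) \int_c^\infty f_{A,R}(a, r)\, dr \\
&= \frac{f_C(c)}{1-p} \int_c^\infty \frac{f_U(a + r)}{\mu_U}\, dr \\
&= \frac{f_C(c)}{1-p} \int_{a + c \le z} z^{-1}\, dG(z).
\end{aligned} $$

Thus, the likelihood is

$$ \Bigl( \prod_{i=1}^{m} \frac{S_C(r_i)}{p x_i}\, dG(x_i) \Bigr) \Bigl( \prod_{j=1}^{n} \frac{f_C(c_j)}{1-p} \int_{y_j \le z} z^{-1}\, dG(z) \Bigr) \propto \Bigl( \prod_{i=1}^{m} dG(x_i) \Bigr) \Bigl( \prod_{j=1}^{n} \int_{y_j \le z} z^{-1}\, dG(z) \Bigr). $$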

2.4. Multiplicative censoring. The scenario described in (iii) of Section 1
is equivalent to Vardi's [23] scheme of multiplicative censoring. In the context
of a prevalent cohort study, multiplicative censoring is induced by defining
the distribution function of $C$, the residual censoring time, as

$$ F_C(t) = \begin{cases} 0, & \text{if } t < 0, \\ 1 - p, & \text{if } 0 \le t < \tau, \\ 1, & \text{if } t \ge \tau, \end{cases} \tag{2.6} $$

where $\tau = \inf\{t : G(t) = 1\}$. If the residual censoring distribution is given
by (2.6), then $f^* = f$ and $g^* = g$. Vardi and Zhang [24] considered a sequence
$\{F_C\}$ with $p_k = m/k$, so that they essentially conditioned on the censoring
proportion.

3. Asymptotics: general overview and master equation. The discussion 
in Sections 3, 4, 5 and 6 is restricted to sampling scheme (i). Let $\hat G$ be the
maximizer of the likelihood $\mathcal{L}$ [see (2.5)] with respect to $G$. In this section
we present the master equation, outline the main steps in establishing
the uniform strong consistency of $\hat G$, and show that $U_{m,n}$, defined below,
converges weakly to a Gaussian process. The details are given in Sections 4,
5 and the Appendix.

Define

$$ U_{m,n} = \sqrt{k}\,(\hat G - G), \qquad W_{X,m} = \sqrt{m}\,(G_m - G^*) $$

and

$$ W_{Y,n} = \sqrt{n}\,(F_n - F^*), $$

where $m = \sum_{i=1}^{k} \delta_i$ and $n = k - m$ are realized values of $M$ and $N$, respectively,
and $G_m$ and $F_n$ are the respective empirical distribution functions of
$x_1, \ldots, x_m$ and $y_1, \ldots, y_n$. Let $\hat p = \hat p_k = m/k$ and let $t_1 < \cdots < t_h$ be the distinct
values of $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$.

The derivation of the asymptotics begins with the score equation derived
from the likelihood $\mathcal{L}$. The NPMLE must satisfy the score equation

$$ d\hat G(t) = \hat p\, dG_m(t) + (1 - \hat p) \Bigl[ \int_{0 < y \le t} \frac{dF_n(y)}{\int_{y \le z} z^{-1}\, d\hat G(z)} \Bigr] t^{-1}\, d\hat G(t), \tag{3.1} $$

subject to $\sum_{j=1}^{h} d\hat G(t_j) = 1$ and $d\hat G(t_j) > 0$, $j = 1, \ldots, h$ ([23], page 754).
Integrating both sides of (3.1), we obtain

$$ \hat G(t) = \hat p\, G_m(t) + (1 - \hat p) \int_{0 < x \le t} \Bigl[ \int_{0 < y \le x} \frac{dF_n(y)}{\int_{y \le z} z^{-1}\, d\hat G(z)} \Bigr] x^{-1}\, d\hat G(x), \tag{3.2} $$

where the final integrand is defined to be 0 for $x > t_h$.
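
Read as a fixed-point relation for the jumps of the estimator, the score equation (3.1) suggests a simple iterative scheme. The sketch below is one hedged way to compute a discrete solution, not necessarily the computational method used by the authors: each uncensored time adds mass $1/k$ at its own location, and each censored time $y$ spreads mass $1/k$ over the support points $t_j \ge y$ in proportion to the current values of $t_j^{-1}\, d\hat G(t_j)$. The function name, the uniform starting value and the stopping rule are assumptions.

```python
def npmle_fixed_point(uncensored_x, censored_y, max_iter=2000, tol=1e-10):
    """Iterate the score equation (3.1) on the pooled support of the data.

    Returns (support, mass), where mass[j] approximates the jump of the
    estimated length-biased distribution function at support[j].
    All observed times are assumed to be strictly positive.
    """
    support = sorted(set(uncensored_x) | set(censored_y))
    k = len(uncensored_x) + len(censored_y)
    # Number of uncensored observations at each support point (k * p_hat * dG_m).
    jumps_x = [uncensored_x.count(t) for t in support]
    mass = [1.0 / len(support)] * len(support)
    for _ in range(max_iter):
        weights = [m / t for m, t in zip(mass, support)]   # t^{-1} dG(t)
        updated = [float(c) for c in jumps_x]
        for y in censored_y:
            # Denominator of (3.1): integral of z^{-1} dG(z) over z >= y.
            denom = sum(w for w, t in zip(weights, support) if t >= y)
            for j, t in enumerate(support):
                if t >= y:
                    updated[j] += weights[j] / denom
        updated = [u / k for u in updated]
        change = max(abs(u - m) for u, m in zip(updated, mass))
        mass = updated
        if change < tol:
            break
    return support, mass
```

Without censored observations the iteration returns the empirical distribution of the uncensored times after a single step. With censoring, the masses always sum to one by construction, although convergence can be slow when mass drifts toward zero at some support points.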

Our first objective is to use (3.2) to provide an explicit linear mapping
on a function space $D_0[0, t]$ (see Section 4 for precise definitions) expressing
an explicit linear functional of $U_{m,n}$ approximately as a linear functional
of $W_{X,m}$, $W_{Y,n}$ and $\hat p - p$. The linear functional of $U_{m,n}$ is shown to be
boundedly invertible, and the resulting expression for $U_{m,n}$ is used to prove
uniform consistency and efficiency for $\hat G$ and weak distributional convergence
for $U_{m,n}$.

Lemma 1 (Master equation). Let 



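$$ \hat f(t) = \int_{t \le z} z^{-1}\, d\hat G(z), $$

$$ W_{m,n}(t) = \hat p^{1/2}\, W_{X,m}(t) + (1 - \hat p)^{1/2}\, \hat f(t) \int_{0 < y \le t} W_{Y,n}(y)\, d\frac{1}{\hat f(y)} \tag{3.3} $$

and

$$ V_{m,n}(t) = W_{m,n}(t) + \Bigl( \frac{p}{1 - p} \Bigr)^{1/2} (G^*(t) - G(t))\, \frac{\sqrt{k}\,(\hat p - p)}{\sqrt{p(1 - p)}}. \tag{3.4} $$

Then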

P-P 
1 — p 



+ 



U m ,n (t) 

P-P 
l-p 



(3.5) 



x <p 



9*{x) 
o<x<t g(x) 



dU m , n (x) 



(l-p)/ »(f ^P-Ai 

J0<y<t \Jy<z Z z J 



V/(y) 



My) 



f(y) 



Vm,n (t) 5 



where $m = \sum_{i=1}^{k} \delta_i$, $n = k - m$ and $p = P(\delta = 1) = P(R < C)$.
Proof. See the Appendix. □

Lemma 1 relates $U_{m,n}$ to the empirical processes $W_{X,m}$ and $W_{Y,n}$, which
are indexed by the realized values of the random integers $M$ and $N$. Equation
(A.5) shows that the process $V_{m,n}(t)$, given by (3.4), can be expressed as
the image of a linear operator applied to $U_{m,n}$. To see this, define



$$ \mathcal{G}_{k,1}(u)(t) = p \Bigl( 1 - \frac{\hat p - p}{1 - p} \Bigr) \int_{0 < x \le t} \frac{g^*(x)}{g(x)}\, du(x), \tag{3.6} $$
«M«)(*) = (1-p)(M)

(3.7) 



v([ u ^d z]d 

0<y<t \Jy<z Z 



(f(t) {\f*(y) 



\f(y) ) f(v) 



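and

$$ \mathcal{H}_k(u)(t) = \Bigl( \frac{\hat p - p}{1 - p} \Bigr) u(t). \tag{3.8} $$

Define

$$ \mathcal{G}_k = \mathcal{G}_{k,1} + \mathcal{G}_{k,2}, $$

and express $\mathcal{F}_k$ as

$$ \mathcal{F}_k = \mathcal{H}_k + \mathcal{G}_k. $$

Then we may write

$$ \mathcal{F}_k(U_{m,n}) = V_{m,n}. \tag{3.9} $$

It is clear that $\mathcal{G}_{k,1}$, $\mathcal{G}_{k,2}$, $\mathcal{H}_k$ and, thus, $\mathcal{F}_k$, are linear operators.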

4. Uniform consistency of $\hat G$. To study the properties of $\mathcal{F}_k$, we first
need to determine the space on which $\mathcal{F}_k$ acts. Let $D_0[0, t]$ be the space of
all cadlag functions $u(\cdot)$ on $[0, t]$ that vanish at 0. The space $D_0[0, t]$ endowed
with the uniform topology, the topology induced by the uniform norm,
$\|u\| = \sup_{0 \le s \le t} |u(s)|$, is a Banach space. This implies that $\mathcal{L}(D_0[0,t], D_0[0,t])$, the
space of bounded linear operators on $D_0[0, t]$, is a Banach algebra. The other
fact about $D_0[0, t]$, endowed with the uniform topology, that we need in the
sequel is that cadlag functions have countably many jumps [19]. This guarantees
that cadlag functions are Riemann integrable on bounded intervals.

Define $\tau = \inf\{t : G(t) = 1\}$. Let $\tau < \infty$ and



where 



$$ \alpha(t) = \frac{1}{t} \int_0^t S_C(s)\, ds = \frac{p\, g^*(t)}{g(t)} \qquad \text{and} \qquad \beta = F_C(0) \ge 0. $$

We note that $\alpha(t)$ is a decreasing function with $\lim_{t \to 0} \alpha(t) = 1 - \beta$. Thus,
$\beta < 1/2$ if $t \in J$. It is also easy to see that a sufficient condition for $t \in J$
is $F_C(t) < 1/2$, since $\alpha(t) \ge S_C(t)$. For the interpretation of this condition,
see [4]. The condition on $t$ is somewhat less restrictive than that given in [4].
See also Section 8 for further comments.

Lemma 2. If $t \in J$, then a.s. for all sufficiently large $k$,

(a) $\mathcal{F}_k$ is a bounded linear operator on $D_0[0, t]$, and

(4.i) K ,|<Jpi + i^ ; 

1 — p 1 — p 

(b) $\mathcal{F}_k$ is an invertible linear operator on $D_0[0, t]$, and

$$ \|\mathcal{F}_k^{-1}\| \le \frac{\lambda(t)}{1 - \lambda(t)\, |\hat p - p| / (1 - p)}, \tag{4.2} $$

where 

((l-p)/(l-$))(2/a(*)-l/(l-0)) 



X(t) 



l-(2/a(t)-l/(l-/3))/3 



PROOF, (a) Define A k (u)(s) = f(s) J* y f y ^ dzdjfo. Then \\A k \\ < 

1 and, therefore, ||<5fc,2|| < Z^T^p' v * a ® n ^ ne °ther hand, using integra- 

tion by parts and since 

t x P9*(x) [1-/9, as x^O, 
a(x) - 



g(x) { 0, as x — > oo, 

we have < (1 — /3)yz^- This completes the proof of part (a), 

(b) We have that Gk,i is invertible and 

G- k }{u){s) = \^-( _ -^du(x). 



1 -pJo<x<spg*{x) 

Using integration by parts and as a(x) is a decreasing function, we have 
< jEjj (^y ~~ TT/j) a,s - Since £(-Do[0,t],-Do[0,£]) is a Banach algebra, 
Q k is invertible a.s. for large k. In fact, 

Qk = Gk,i(I + G k \Gk?) 

and thus 

Gk 1 = ( I + Gk,iGk,2)~ 1 Gk}, 

which implies that 

II^HIKi + ^fer 1 !!!!^!! 

((1 _ p) /(i - p))(2/q(t) - 1/(1 - /?)) _ t 

" l-(2/a(t)- 1/(1 -/?))/? " A(t) a - S -' 

since $\hat p \to p$ a.s. as $k$ gets large. Having established the invertibility of $\mathcal{G}_k$,
we have
Using the strong consistency of $\hat p$ and the fact that $\mathcal{L}(D_0[0,t], D_0[0,t])$ is a
Banach algebra, we obtain once again



and also 

im _1 |l<IIU + ^ 1 ^)" 1 |lll^ 1 | 
ll^ll 



< 



< 



a.s. 



A(t) 



a.s. 



i - II^IIIp-pI/C 1 -p) 1 - Kt)\P-p\/(i -p) 

This completes the proof for part (b). □ 

Theorem 1 below and its corollary prove the uniform strong consistency
of $\hat G$.

Theorem 1. Let $\hat G$ be the NPMLE of the continuous lifetime distribution
function $G$, and $t \in J$. Then

$$ \|\hat G - G\|_\infty = \sup_{0 \le s \le t} |\hat G(s) - G(s)| = O\Bigl( \sqrt{\frac{\log\log k}{k}} \Bigr) \quad \text{a.s.} \tag{4.3} $$



\G — GHoo < ^ 1 1 



Vm.n 



PROOF. Using Lemma 2 and (3.9) 

Tr l \\ I 
2/a(t)- 1/(1-0) 



On the other hand, 

limsup H^T 1 1| < A(t) = — 

k-^oo 1 

It therefore suffices to show that 



(2/a(t)- 1/(1-/3))/? 



as k — > oo. 



(4.4) 



Vrr, 



Vk 



o 



' log log k 



k 



a.s. 



Next, using (3.3), we have 

\W m , n {t)\ <p 1/2 |Wx,mW| +(l-p) 1/2 ||WV,n||oc 

On the other hand, using the law of the iterated logarithm (LIL), 



1-p 



(G.(t)-G(t)) 



p — p 



VpO-p) 



01 



' log log k 



k 



a.s. 



To complete the proof, we need to show that

$$ \|G_m - G^*\|_\infty = O\Bigl( \sqrt{\frac{\log\log k}{k}} \Bigr) \quad \text{a.s.} \tag{4.5} $$

and

$$ \|F_n - F^*\|_\infty = O\Bigl( \sqrt{\frac{\log\log k}{k}} \Bigr) \quad \text{a.s.} \tag{4.6} $$


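To establish (4.5), one may either use the Kolmogorov exponential bounds
and Borel-Cantelli lemma or, by using the LIL, argue as follows. Splitting
one of the sums into two parts, we have that

$$ \|G_m - G_{kp}\|_\infty \le \frac{1}{kp} \Bigl\{ \bigl| [kp] - m \bigr| + m\, O\Bigl( \sqrt{\frac{\log\log k}{k}} \Bigr) \Bigr\}, $$

which implies

$$ \|G_m - G_{kp}\|_\infty = O\Bigl( \sqrt{\frac{\log\log k}{k}} \Bigr) \quad \text{a.s.,} \tag{4.7} $$

where $G_{kp}$ is the empirical distribution function of $x_1, x_2, \ldots, x_{[kp]}$. Now,
using the triangle inequality,

$$ \|G_m - G^*\|_\infty \le \|G_m - G_{kp}\|_\infty + \|G_{kp} - G^*\|_\infty. \tag{4.8} $$

Thus, (4.5) follows from (4.7), (4.8) and the fact that

$$ \|G_{kp} - G^*\|_\infty = O\Bigl( \sqrt{\frac{\log\log k}{k}} \Bigr) \quad \text{a.s.} $$
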

Likewise, one can establish (4.6). This completes the proof. □ 

Equation (4.3), established by Theorem 1, tells us how fast $\hat G$ converges
to $G$ in the supnorm topology. Strong consistency of $\hat G$ may therefore be
stated as a corollary to Theorem 1.

Corollary 1. Suppose $F_U$ is a continuous lifetime distribution function.
Let $G$ be the length-biased distribution function of $F_U$ given by (2.2)
and $t \in J$. Then $\hat G$, the NPMLE of $G$ based on data collected according to
sampling scheme (i), is uniformly strongly consistent on $[0, t]$.

5. Weak convergence of $U_{m,n}$. To establish the weak convergence of
$U_{m,n}$ to a Gaussian process, we first need to prove the following lemma.

Lemma 3. If $t \in J$, then

$$ \|\mathcal{F}_k(u) - \mathcal{F}(u)\|_\infty \to 0 \quad \text{a.s.} \quad \forall u \in D_0[0, t], $$

where

$$ \mathcal{F} = \mathcal{G}_1 + \mathcal{G}_2, \tag{5.1} $$

$$ \mathcal{G}_1(u)(s) = p \int_{0 < x \le s} \frac{g^*(x)}{g(x)}\, du(x) \tag{5.2} $$

and



,5,, ^w.a-^,^^,,).^-,)^ 

Proof. Using the law of large numbers and the bound

$$ \|\mathcal{H}_k\| \le \frac{|\hat p - p|}{1 - p} \quad \text{a.s.,} $$

we have $\|\mathcal{H}_k\| \to 0$ a.s. as $k \to \infty$. It is also easily seen that

1—p 



\\Gk,i-Gi\\ < 



1 



a.s. as k —> oo. 



1—p 

To complete the proof, we need to show that

$$ \|\mathcal{G}_{k,2}(u) - \mathcal{G}_2(u)\|_\infty \to 0 \quad \text{a.s. as } k \to \infty $$

for all $u \in D_0[0, t]$. This can be done along the lines of Lemma 2 of [24]. We
therefore omit the proof. □

Theorem 2. Suppose $F_U$ is a continuous life time distribution function.
Let $G$ be the length-biased distribution function of $F_U$ given by (2.2) and let
$\hat G$ be the NPMLE of $G$. Then for any $t \in J$,

$$ U_{m,n} = \sqrt{k}\,(\hat G - G) \Rightarrow U = \mathcal{F}^{-1}(V) \quad \text{in } D_0[0, t], $$

where $\mathcal{F}^{-1}$ is the inverse of $\mathcal{F}$ given by (5.1), (5.2) and (5.3),

$$ V(s) = p^{1/2} B_1(G^*(s)) + (1 - p)^{1/2} f(s) \int_{0 < y \le s} B_2(F^*(y))\, d\frac{1}{f(y)} + \Bigl( \frac{p}{1 - p} \Bigr)^{1/2} (G^*(s) - G(s))\, Z, $$

$Z \sim N(0, 1)$, and where $B_1$ and $B_2$ are independent Brownian bridge processes,
independent of $Z$.

Proof. We first need to show that $\mathcal{F}$ is invertible. This is done using
a similar argument to that used in Lemma 2. It is also easy to see that
\T~ X \ < A(t), where A(t) = rj^^pr/jl^g - In view of Theorem 7.3.2 
and 7.3.3 of [8], the limiting processes of $W_{m,n}$ and $W_{[kp],[k(1-p)]}$ are the
same. Hence, we need only find the limiting process of

$$ W_{[kp],[k(1-p)]}(s) + \Bigl( \frac{p}{1 - p} \Bigr)^{1/2} (G^*(s) - G(s))\, \frac{\sqrt{k}\,(\hat p - p)}{\sqrt{p(1 - p)}}. $$

Theorem 2 of [24] may now be used to complete the proof. □

6. Asymptotic efficiency of $\hat G$. It transpires that, under scheme (i), $\hat G$
is asymptotically efficient in the class of regular estimators whose finite-dimensional
limiting laws are continuous in $G$. This result is perhaps not
unexpected, given that Vardi and Zhang [24] have established the asymptotic
efficiency of $\hat G$ under multiplicative censoring, which we show in Section 7 is
a special case of scheme (i). Since the proof under scheme (i) mimics that of
Vardi and Zhang, we omit the details. A systematic account of asymptotic 
efficiency and the convolution theorem can be found in [6, 12]. Here we follow 
the approach taken by Vardi and Zhang [24] and confine our attention to 
regular estimators whose finite-dimensional limiting laws are continuous in 
G. 

Let $H(\cdot)$ be a stochastic process in $D_0[0, \tau]$. The distribution of $H(\cdot)$
in $D_0[0, \tau]$ and the $k$-dimensional joint distribution of $(H(s_1), \ldots, H(s_k))$ under
the probability $P_G$ will respectively be denoted by $\mathcal{L}(H; G)$ and
$\mathcal{L}(H; G, s_1, \ldots, s_k)$. Let $\nu$ be a measure on $[0, \infty)$ with respect to which the distribution
$G$ has a density $g$. Let $\mathcal{F}(\nu)$ denote the set of all densities with respect to
$\nu$. Let $C(g, q)$ be the set of all sequences of densities $\{g_k \in \mathcal{F}(\nu)\}$ such that

$$ \lim_{k \to \infty} \bigl\| k^{1/2} (g_k^{1/2} - g^{1/2}) - q \bigr\|_2 = 0, \tag{6.1} $$

where $q \in L_2(\nu)$ and $\|\cdot\|_2$ is the $L_2(\nu)$ norm. The limit in (6.1) implies in a
standard way that $q \perp g^{1/2}$. Let $C(g) = \bigcup_{q \in L_2(\nu),\, q \perp g^{1/2}} C(g, q)$.

Suppose $\{g_k\} \in C(g)$ is an arbitrary sequence with corresponding c.d.f.s
$\{G_k\}$. Following Beran [5], we say that a sequence of estimators $\tilde G_k$ is regular
at $g$ if

$$ \mathcal{L}(\sqrt{k}\,(\tilde G_k - G_k); G_k) \Rightarrow \mathcal{L}(\tilde U; G) \quad \text{in } D_0[0, \tau], $$

where $\mathcal{L}(\tilde U; G)$ depends only on $g$ and not on the choice of the sequence
$\{g_k\} \in C(g)$ which determines the sampling scheme. Theorem 3 below establishes
superiority of the NPMLE over all regular estimators whose finite-dimensional
limiting laws are continuous. The proof is similar to the proof
of Theorem 3 of [24] and, therefore, is omitted.

Theorem 3. Let $p > 0$ and $\tilde G$ be a sequence of regular estimators with
a limiting law $\mathcal{L}(\tilde U; G)$ whose finite-dimensional laws $\mathcal{L}(\tilde U; G, s_1, \ldots, s_k)$ are
continuous in $G$ under the supnorm topology for $G$. Then there exists a
stochastic process $H(\cdot)$ in $D_0[0, t]$, $t \in J$, such that

$$ \mathcal{L}(\tilde U; G) = \mathcal{L}(H; G) * \mathcal{L}(U; G), $$

where $U$ is as in Theorem 2 and "*" denotes the convolution.

7. The other sampling schemes. This section has two purposes: the first
is to indicate briefly how the asymptotics might be established under scheme 
(ii), so that the case of random M and fixed M = m may be contrasted. The 
second is to demonstrate explicitly how multiplicative censoring [scheme (iii)] 
may be regarded as a special case of scheme (i). 

Sampling scheme (ii): Random censoring (conditional on M). Under
sampling scheme (ii), the proportion of uncensored observations is fixed.
Assuming that $\hat p = p$, that is, conditioning on the proportion of uncensored
observations, $\mathcal{H}_k$ given by (3.8) vanishes, $\mathcal{G}_{k,1}$ given by (3.6) reduces to
$\mathcal{G}_1$ given by (5.2), while $\mathcal{G}_{k,2}$ given by (3.7) remains unchanged. Also, the
second term on the left-hand side of (A.5) vanishes when we condition on
the proportion of uncensored observations. We therefore obtain the following
master equation for scheme (ii):

$$ \mathcal{F}_k(U_{m,n}) = W_{m,n}, $$

where

$$ \mathcal{F}_k = \mathcal{G}_1 + \mathcal{G}_{k,2}, $$

and $W_{m,n}$ is given by (3.3). It then follows from the results in Sections
4 and 5 that, under sampling scheme (ii), $\hat G$ is uniformly strongly consistent
and

$$ U_{m,n} \Rightarrow U = \mathcal{F}^{-1}(W) \quad \text{in } D_0[0, t] \quad \forall t \in J, $$

where

$$ W(s) = p^{1/2} B_1(G^*(s)) + (1 - p)^{1/2} f(s) \int_{0 < y \le s} B_2(F^*(y))\, d\frac{1}{f(y)} $$

and $\mathcal{F}^{-1}$ is the inverse of $\mathcal{F} = \mathcal{G}_1 + \mathcal{G}_2$, where $\mathcal{G}_1$ and $\mathcal{G}_2$ are, respectively,
given by (5.2) and (5.3).



Sampling scheme (iii): Multiplicative censoring. Having assumed (2.6)
as the residual censoring distribution and by conditioning on the censoring
proportion, $\mathcal{H}_k$ vanishes, while $\mathcal{Q}_{k,1}(u)(t)$ and
$\mathcal{Q}_{k,2}(u)(t)$, respectively, reduce to $p_k I$ and
$(1 - p_k)\mathcal{A}_f$, where $I$ is the identity map and

Putting the above reduced forms together, we obtain the following master
equation for sampling scheme (iii):

$\mathcal{V}_k(U_{m,n}) = W_{m,n},$

where $\mathcal{V}_k = p_k I + (1 - p_k)\mathcal{A}_f$. It then follows from
the results in Sections 4 and 5 that, under the multiplicative censoring
scheme [scheme (iii)], $\hat G$ is uniformly strongly consistent and

$U_{m,n} \Rightarrow U = \mathcal{V}^{-1}(W)$ in $D_0[0,\tau]$,

where

$W(s) = p^{1/2} B_1(G(s)) + (1-p)^{1/2} f(s) \int_{0<y<s} B_2(F(y))\, d\frac{1}{f(y)}$

and $\mathcal{V}^{-1}$ is the inverse of $\mathcal{V} = pI + (1 - p)\mathcal{A}_f$,
if $p = \lim_{k\to\infty} p_k > 0$.
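To make the multiplicative censoring scheme concrete, the Python sketch below simulates it under the usual formulation, assumed here, in which a censored observation is $Y = UX$ with $U$ uniform on $(0,1)$ and independent of $X \sim G$. It then checks the resulting density $f(y) = \int_{z>y} z^{-1}\,dG(z)$ for one hypothetical choice of $G$, namely the Gamma(2,1) law with density $x e^{-x}$, for which $f(y) = e^{-y}$. The choice of $G$, the sample size and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.gamma(shape=2.0, scale=1.0, size=n)   # assumed length-biased lifetimes, density x*exp(-x)
u = rng.uniform(size=n)                       # multiplicative censoring variable, independent of x
y = u * x                                     # multiplicatively censored observations

# For this G, f(y) = integral over z > y of z^{-1} dG(z) = exp(-y), so Y should be Exp(1).
grid = np.array([0.5, 1.0, 2.0])
empirical_cdf = np.array([(y <= t).mean() for t in grid])
theoretical_cdf = 1.0 - np.exp(-grid)
max_abs_error = np.abs(empirical_cdf - theoretical_cdf).max()
```

The small value of `max_abs_error` reflects only Monte Carlo error in this particular example.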



8. Concluding remarks. We have proved that, for length-biased right 
censored prevalent cohort survival data with follow-up, the NPMLE of the 
length-biased (i.e., prevalent case) survival function is strongly uniformly 
consistent, converges weakly to a Gaussian process, and is asymptotically 
efficient. It can be shown [4] that the NPMLE of the unbiased (i.e., inci- 
dent case) survival function inherits these properties. The approach taken 
here is based on that used by Vardi and Zhang [24], although their methods
do not carry over to the current more general setting without substantial
modification, owing to the random censoring of the residual lifetimes. An
apparently essential condition imposed for establishing the asymptotic
results in Sections 4, 5 and 6 is that $\tau \in J$. This condition is not
restrictive since, in practice, $\beta$, the mass of the residual censoring
distribution at 0, would be very small. For instance, if $\beta = 0.01$, then
a sufficient condition for $\tau \in J$ is that $F_c(\tau) < 0.98$.

In view of the fact that the current and residual lifetimes are equally
distributed under stationarity, $\beta$ represents the proportion of
uncensored observations with missing onset time. This then means that the
results presented here address three of the four possible cases, that is,
censored/uncensored and with/without onset time, in the setting considered
in this paper.
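The equality in distribution of current and residual lifetimes under stationarity can be illustrated by simulation. The Python sketch below draws onset times uniformly on a long window before recruitment (a stand-in for a constant incidence rate), keeps the cases still alive at recruitment time 0, and compares the current and residual lifetimes. The Gamma(3,1) incident-case lifetime law, the window length, the sample size and the seed are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_onsets, window = 2_000_000, 50.0
onset = rng.uniform(-window, 0.0, size=n_onsets)          # stationary incidence on [-window, 0]
lifetime = rng.gamma(shape=3.0, scale=1.0, size=n_onsets) # assumed incident-case lifetimes

prevalent = onset + lifetime > 0.0          # cases still alive at recruitment time 0
current = -onset[prevalent]                 # time from onset to recruitment
residual = (onset + lifetime)[prevalent]    # time from recruitment to failure

mean_current = current.mean()               # theory: E[X^2] / (2 E[X]) = 2
mean_residual = residual.mean()             # theory: same value, 2
mean_prevalent_lifetime = lifetime[prevalent].mean()  # length-biased mean E[X^2]/E[X] = 4
median_gap = abs(np.median(current) - np.median(residual))
```

In this example the prevalent-case lifetimes are length-biased (mean 4 rather than 3), while the current and residual lifetimes share the same mean and median up to simulation error.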

If we allow an arbitrary unspecified incidence process, then the model 
becomes nonidentifiable and nonparametric estimation must be conditional 
on the truncation times, an approach that is commonly used because of 
its robustness against departure from stationarity of the incidence process. 
Wang [25], however, points out that this approach is only justified as condi- 
tional maximum likelihood if all censoring times are known, even for those 
who fail before they are censored. When the intensity of the incidence pro- 
cess is known, one can mimic the proofs given here to establish asymptotic 
results. This, however, entails a new master equation and, therefore, new 
subsequent steps. 
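Proof of Lemma 1 (The master equation). Let
$\Delta = [\hat p\, U_{m,n}(t) - W_{m,n}(t)]/\sqrt{k}$, where $\hat p = m/k$.
Thus

$$\Delta = \hat p\,(\hat G(t) - G_m(t)) - \frac{\sqrt{n}}{k}\, f(t) \int_{0<y<t} W_{Y,n}(y)\, d\Bigl(\frac{1}{f(y)}\Bigr) + \hat p\,(G_*(t) - G(t)).$$

Now using the equation (from [24], page 1034)

$$\frac{f(t)}{\sqrt{n}} \int_{0<y<t} W_{Y,n}(y)\, d\Bigl(\frac{1}{f(y)}\Bigr) = \int_{0<x<t} \Bigl[ \int_{0<y<x} \frac{d(F_n(y) - F_*(y))}{\int_{y<z} z^{-1}\,dG(z)} \Bigr] x^{-1}\,dG(x),$$

obtained via integration by parts and a change of order of integration, and
noting that $\sqrt{n}/k = (1-\hat p)/\sqrt{n}$, we have

$$\Delta = \hat p\,(\hat G(t) - G_m(t)) - (1-\hat p) \int_{0<x<t} \Bigl[ \int_{0<y<x} \frac{d(F_n(y) - F_*(y))}{\int_{y<z} z^{-1}\,dG(z)} \Bigr] x^{-1}\,dG(x) + \hat p\,(G_*(t) - G(t)).$$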
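The integration-by-parts identity quoted from [24] can be checked numerically in a simple case. The Python sketch below does so with a smooth function $H$ standing in for $n^{-1/2} W_{Y,n}$ (which in the proof is a centred step function), and with $G$ taken to be the Gamma(2,1) law, so that $f(y) = \int_{y<z} z^{-1}\,dG(z) = e^{-y}$ and $x^{-1}\,dG(x) = e^{-x}\,dx$. Both the stand-in $H(y) = \sin y$ and the choice of $G$ are assumptions made only for this illustration.

```python
import numpy as np

def trapezoid(vals, grid):
    """Composite trapezoid rule on a given grid."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

t = 1.5
y = np.linspace(0.0, t, 4001)
f = np.exp(-y)      # f(y) for the assumed G with density z*exp(-z)
H = np.sin(y)       # hypothetical smooth stand-in with H(0) = 0
dH = np.cos(y)      # derivative of H

# Left-hand side: f(t) * int_0^t H(y) d(1/f(y)), where d(1/f(y)) = exp(y) dy here.
lhs = f[-1] * trapezoid(H * np.exp(y), y)

# Right-hand side: int_0^t [ int_0^x dH(y)/f(y) ] x^{-1} dG(x), with x^{-1} dG(x) = exp(-x) dx.
a = dH / f
inner = np.concatenate(([0.0], np.cumsum((a[1:] + a[:-1]) * np.diff(y) / 2.0)))
rhs = trapezoid(inner * np.exp(-y), y)

# Closed form of both sides for this example: (sin t - cos t)/2 + exp(-t)/2.
exact = (np.sin(t) - np.cos(t)) / 2.0 + np.exp(-t) / 2.0
```

Both `lhs` and `rhs` agree with `exact` up to quadrature error, which is consistent with the identity for this smooth stand-in.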
Using (3.2), 

A = pG(t) + (l-p) 



0<x<t J0<y<x J y<z Z" 1 dG(z 



dF*(y) 



I0<x<t 

-G(t)+p(G*(t)-G(t 
(1-p) 



0<x<t 



o<y<x J y ^ z z^dG(z) 



x- x dG(x) 



o<y<x J y < z z- l dG(z) 



dy 



x^dGix) 



+ p(G*(t)-G(t)) 
Utilizing (2.4), we obtain 
dF*(y) 



0<y<x 



J y < z z^dG(z) 



dy 



0<y<x 
1 



F c (y)f(y)dy/(1- P ) 
J y < z z-idG(z) 



dy 



1-P J0<y<a 

We also have 

F c (y)f(y)-(i-p)f(y) 
f(y) 



F c (y) f -Sr-dy-{l-p)dy 



Fc(y) S y < z z" 1 dG(z) - (1 - p) f y < z z- 1 dG(z) 
J y < z z- l dG(z) 

f y < z z- 1 dU m , n (z) S c (y) f y < z z- 1 dG(z) 
VkJ v<g ^dG(z)~ J v<z z^dG(z) 



+ p. 
Thus, 



A 



(i-p)/(i-p) 



0<x<t 



Iz>y Z 1 dU m ,n{z) 

o<y<x J^yZ^dGiz) 



dy 



X - 1 dG{x) 



(A.1) 



+ {pG*(i) 
1 — p 



1 — p 



0<x<t 



0<y<x J^yZ^dGiz) 



dy 



x -1 dG(x] 



^G(t)-G(t) 



1 — p 
= I + II + III. 

We simplify the terms $I$, $II$, $III$ in (A.1). First, as in [24], page 1035, 

r V-M^-d-mtv! ^dzd(J-). 

JKJ Jo y Jz>V Z* \f(y)J 



Next, in $II$, after substituting for $G_*(t) = \int_0^t g_*(x)\,dx$, using (2.3) and
replacing $dG$ in the inner integral by $d\hat G - k^{-1/2}\,dU_{m,n}$, we have 



II 



t r rx p_ p 



+ 



while 



o l-p 
1 l-p rt 
y/kl-pjQ 
1 l-p rt 
y^l-pji 



III=p 



Sc{y)dy 



x~ l dG{x) 



Sc(y)dy 



S c{y)I z > y Z 1 dU mtn (z) 

° ' f z > y z~idG(z) 

1 U m , n {t)- P -^G{t) 



dy 



1-pVk ' 1-P 
Now, combining the above simplified forms for $I$, $II$, $III$, we obtain 

pU m ,n(t) ~ W m) n(t) 

l-p' yl h y l*>, z' \f(y)J 
+ 



f Vkjp-p) r* 
I l-p Jo 
1 — p 



Sc(y)dy 



+ 



+ P 



1 p Jo 
l-p rt 

l-p Jo 

1 — p 



Sc(y)dy 



s c(y)J z > y z 1 dU m , n {z) 
lo ' J z > y z-idG(z) 



djj 



X - l dG(x) 



1 — p 



i — p 



Thus, 



U m ,n\Z) , , 1 

4 — dzd- 



+ 



(A.2) 



1~P 
l-p Jo 

1 — p 
l-p Jo 



Sc(y)dy 



Jz>y Z* 

x dU m ^ n (x^j 



s c(y)J z > y z 1 dU m , n (z) 



dy 



x^dGix) 



W m)n {t)+pJ-^{G^t)-G{t)) y 



1 — p 



Using the equation

$$\frac{d}{dy}\Bigl( y \int_{z>y} z^{-2}\, U_{m,n}(z)\,dz \Bigr) = \int_{z>y} z^{-2}\, U_{m,n}(z)\,dz - y^{-1} U_{m,n}(y) = \int_{z>y} z^{-1}\,dU_{m,n}(z),$$

the fourth term on the left-hand side of (A.2) can be simplified to 

*r rxS c {y)$ z > y z- l dU m , n {z) 



10 



o I z > y z-idG(z) 



dy 



x^dGix) 



(A.3) 



/ / x~ l dG{x) 

Jo Uy 



Sc{y)(d/dy)(y J z > y z 2 U m , n (z)dz) 

fly) 



dy 



\ /(*) 



\Jy<z 



f(y) 



Sc{y)d[y J z 2 U m ^ n {z)dz S j 

)s c (y) 



dz ) d 



(m-i 

v/(y) 
Substituting (A.3) into (A.2), we obtain 

U m .n(z) 

P 



f>-V j T ,,, , i -p , 



PJo 



z>y 



(A.4) + 



1 — p 

l-p Jo 



tVrx 



Sc(y)dy 



W m>n (t) +p 



l-p" ~ W y/p(l-p) 

Using (2.3) and (2.4), one can simplify (A.4) further to the form 

'p-p^ 



1 — p 



+ 



p — p 
1 — p 



(A.5) 



^ldU m . n (x) 

lo<x<t g{x) 



JO 

W m M) +p 



^4^dz 

0<y<t \Jy<z Z 



(fit) 



P 



l-p 



(G*(t)-G(t)) 



V/(y) 

Vkjp-p) 

VpO--p) 



My) 



□ 